Machine Translation - Corpus Linguistics
Abstract
Because communication between people, companies, and even nations is so important, the need for good translators has long been evident around the world. Since human translators are fallible, it is not surprising that many attempts have been made throughout history to automate the process of translation. This is called machine translation. Corpus linguistics, a branch of the machine translation tree, utilises statistical methods to analyse text samples (corpora) and draws conclusions based on the results. The goal of this paper is to identify and account for the problems and possibilities associated with this kind of statistical approach to translation, as well as to give a brief overview of the history of the topic and a glimpse of projects currently under research.
Similar resources
Principles of corpus linguistics and their application to translation studies research
Corpora have been put to many different uses in fields as varied as natural language processing, critical discourse analysis and applied linguistics, to mention just a few. As is to be expected, within each of those areas corpora fulfil different roles, from providing data to build statistical machine translation systems to revealing ideological stance in politically sensitive texts. ‘Corpus lin...
Building Parallel Corpora for SMT System: A Case Study of English-Manipuri
Statistical Machine Translation (SMT) systems are developed using sentence-aligned parallel corpora. The difficulty is that no parallel corpus of the required size exists for many language pairs. Preparing a large-scale parallel corpus takes time and demands linguistic skill. In the present work, the various issues of a quality parallel corpus and a technique that extracts p...
EVBCorpus - A Multi-Layer English-Vietnamese Bilingual Corpus for Studying Tasks in Comparative Linguistics
Bilingual corpora play an important role as resources not only for machine translation research and development but also for studying tasks in comparative linguistics. Manual annotation of word alignments is important for providing a gold standard for developing and evaluating machine translation models and comparative linguistics tasks. This paper presents research on building an English-Vi...
CzEng: Czech-English Parallel Corpus release version 0.5
We introduce CzEng 0.5, a new Czech-English sentence-aligned parallel corpus consisting of around 20 million tokens in each language. The corpus is available on the Internet and can be used under the terms of the license agreement for non-commercial educational and research purposes. Besides the description of the corpus, also preliminary results concerning statistical machine translation experim...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...